ABSTRACT

In this paper we present a technique to classify masses in digital mammograms using an Artificial Neural Network
(ANN), which performs malignant-normal classification on a region of interest (ROI) that contains a mass. The major
mammographic characteristics for mass classification are intensity, shape and texture, and the ANN exploits all three
to classify a mass as malignant or normal. The features used to characterize the masses are mean,
standard deviation, skewness, area, perimeter, homogeneity, energy, contrast and entropy. The main aim of the method is
to increase the effectiveness and accuracy of the classification process in an objective manner and to reduce the number of
false-positive malignancies. An ANN with nine features is proposed for classifying the marked regions as malignant or
normal. With the ANN classifier, experimental results show 96.875% accuracy, 96.551% sensitivity and 97.142%
specificity.

INTRODUCTION

The incidence of breast cancer in India is low but rising. Breast cancer is the most common cancer among urban Indian
women and the second most common among rural women, against a background of low awareness of the disease and the
absence of a breast cancer screening program. A recent study of breast cancer risk in India revealed that 1 in 28 women
develops breast cancer during her lifetime [1]. The risk is higher in urban areas, at 1 in 22, than in rural areas, where
it is 1 in 60. The average age of the high-risk group in India is 43-46 years, unlike in the West, where women aged
53-57 years are more prone to breast cancer.

A report estimated that one in eight women in the U.S. and one in thirteen in Australia develop breast cancer
during their lifetime. Breast cancer continues to be a significant public health problem among women around the world.
It has become the number one cause of cancer deaths amongst Malaysian women. In the European Community, breast
cancer represents 19% of cancer deaths and 24% of all cancer cases. Nearly 25% of all breast cancer deaths occur in
women diagnosed between ages 40 and 49 years.

In order to reduce morbidity and mortality, early detection of breast cancer is essential. However, the appearances
of breast cancer are very subtle and unstable in its early stages. Therefore, doctors and radiologists can easily miss an
abnormality if they diagnose by experience alone. Mammography technology can help doctors and radiologists reach
a more reliable and effective diagnosis, since it checks mammograms as a "second reader" and thus gives doctors
and radiologists useful advice.

Digital mammography is the best available examination for the detection of early signs of breast cancer, and it can
reveal pronounced evidence of abnormality such as masses and calcifications. Like a standard mammogram, a digital
mammogram uses x-rays to produce an image of the breast. The differences are in the way the image is recorded, viewed
by the doctor, and stored. Standard mammograms are recorded on large sheets of photographic film. Digital mammograms
are recorded and stored on a computer. After the exam, doctors can view them on a computer screen and adjust the
image size, brightness, or contrast to see certain areas more clearly. Digital images can also be sent electronically to
another site for a consultation with breast specialists. While the digital option is not available at all centres, it is becoming
more widely available.

In this paper, automatic classification of masses as malignant or normal is presented, based on statistical and
textural features that are extracted from the mass in the breast region and classified using an ANN. The paper is
organized as follows. Section 2 briefly reviews some existing techniques for mass classification, followed by the
artificial neural network (ANN) in Section 3. Statistical and texture features are described in Section 4. Section 5
describes the proposed method for mass classification. Section 6 presents simulation results and their performance
evaluation, and conclusions are presented in Section 7.

LITERATURE SURVEY 

Breast cancer is the most common cancer and continues to be a significant public health problem among women
around the world. Medical imaging systems are constantly improving in image quality because of increased image
resolution. This results in a growing number of images that have to be inspected for diagnosis. Early detection and
diagnosis is the only means of control, but it is a major challenge in India due to lack of awareness and the lethargy of
Indian women towards health care and regular check-ups. Detection of abnormal masses within the breast, as well as
breast image segmentation, is a very important task in image analysis. Radiologists interpret mammogram images to
detect abnormalities such as clustered microcalcifications (MCCs), masses, architectural distortion,
asymmetry between breasts, breast edema and lymphadenopathy. They then diagnose the abnormalities to determine
whether each is benign or malignant. In recent years, a few researchers in academia and
industry have used different approaches to classify masses.

Jawed Nagi et al. [8] developed an automated technique for mammogram segmentation. The proposed
algorithm uses morphological preprocessing and seeded region growing (SRG) to remove digitization noise, suppress
radiopaque artifacts and remove the pectoral muscle, accentuating the breast profile region for use in CAD algorithms.

Jelena Bozek et al. [9] described computer-aided detection and diagnosis of breast abnormalities in digital
mammography. Masses, calcifications, architectural distortion and bilateral asymmetry are defined by a wide range of
features and can indicate malignant changes, but can also be part of benign changes. Most of the features, such as shape,
margin, distribution and size, can be detected using the developed algorithms. However, there are problems in the
detection and diagnosis of breast abnormalities that are specific to particular lesions. Some of these problems are the
visibility of the lesion, the ability to distinguish it from surrounding tissue, and the appropriate classification of the
change as malignant or benign.

Nawazish Naveed et al. [10] proposed malignancy and abnormality detection in mammograms using
DWT features and an ensemble of classifiers. The main difficulty in digital mammogram diagnosis is the detection of
malignant images and their classification on the basis of the abnormalities present. The authors investigated the accuracy
of a detection methodology that uses DWT features as input to different classifiers, namely K-nearest neighbor (KNN),
artificial neural networks (ANN) and Support Vector Machine (SVM), and ensembles the results generated by these
classifiers. Next, the malignant images are passed through a bank of these ensemble classifiers, which are again trained
for classification of different abnormalities. A one-against-all approach is used for multi-class classification: each
ensemble classifier is trained for one abnormality and assigns a probability to the abnormality for which it is trained.
Median, mean and product rules are used to combine the results of the binary classifiers.

Mass lesion detection using the wavelet decomposition transform and a support vector machine was proposed by
Ayman Abu Baker et al. [11]. The proposed method has three main stages: detection of the region of interest,
extraction of wavelet features, and classification with a support vector machine (SVM). In the region-of-interest
detection stage, morphological processing, object labeling and size filtering are implemented. The main purpose of this
technique is to study the properties of true positive (TP) and false positive (FP) detected regions in mammogram images
by analyzing their wavelet features with the SVM. The combination of wavelet features and SVM
was used to reduce the number of detected FP regions.

Nevine H. Eltonsy et al. [12] developed a concentric morphology model for the detection of masses in
mammography. The technique is based on the presence of concentric layers surrounding a focal area with suspicious
morphological characteristics and low relative incidence in the breast region. Mammographic locations with a high
concentration of concentric layers with progressively lower average intensity are considered suspicious deviations from
normal parenchyma. Morphologic concentric layer analysis is a promising strategy for screening mammograms to identify
locations highly likely to contain malignant masses while keeping the detection rate of benign masses significantly
lower.

Byung-Woo Hong et al. [13] proposed a topographic approach to the segmentation of regions of interest in
mammograms. A topographic representation is developed using isolevel contours. The topological and geometrical
relationships between contours are analyzed using the inclusion tree. A breast coordinate system can be stabilized after 
segmentation of the breast boundary and the pectoral muscle. This coordinate system may provide useful information for 
the identification of masses and registration of two mammograms. A topographic representation is largely invariant to 
brightness and contrast, and it provides a robust and efficient representation for the characterization of mammographic 
features. 

Shih-Chung B. Lo et al. [14] proposed a multiple circular path convolution neural network (MCPCNN) system for
the detection of mammographic masses. The architecture was designed specifically for the analysis of tumors and
tumor-like structures. The authors first divided each suspected tumor area into
sectors and computed the defined mass features for each sector independently. These sector features were used on the input
layer and were coordinated by convolution kernels of different sizes that propagated signals to the second layer of the
neural network system. The MCPCNN is capable of analyzing correlated features within a sector and between adjacent
sectors, which led to an improvement in detecting mammographic masses.

Weidong Xu et al. [15] described a new ANN-based algorithm for detecting masses in digital mammograms.
It first built two mass models to represent masses with different backgrounds and features, and used different
detection methods for the different types of mass: for masses inside fatty tissue, iterative thresholding was applied to
locate them; for masses in denser tissue, black hole registration based on the discrete wavelet transform (DWT) was
used instead. Then, filling dilation was used to extract the whole masses from the background, which was adjusted
adaptively by ANFIS.

Pradeep N et al. [16] described a method for feature extraction from mammograms. Pattern recognition in image
processing requires the extraction of features from the ROI of the image and the processing of these features with a
pattern recognition algorithm. Features are observable patterns in the image that give some information about the
image. For every pattern classification problem, the most important stage is feature extraction, since the accuracy of the
classification depends on it. The features that can be extracted from a digital mammogram
are texture features, statistical features, and structural features.

Ioan Buciu et al. [17] presented directional features for automatic tumor classification of mammogram images.
Patches around tumors are manually extracted to segment the abnormal areas from the remainder of the image, which is
considered background. The mammogram images are filtered using Gabor wavelets, and directional features are extracted
at different orientations and frequencies. Principal Component Analysis is employed to reduce the dimension of the
filtered and unfiltered high-dimensional data. Support Vector Machines are used for the final classification. The
robustness of Gabor features for digital mammogram images distorted by Poisson noise of different intensity levels is
also addressed.

M. Sundaram et al. [18] proposed a histogram-modified local contrast enhancement method for mammogram
images. In this method, the authors adjust the level of contrast enhancement, which gives the resulting image strong
contrast and also brings out the local details present in the original image for more relevant interpretation. It uses
two-stage processing comprising histogram modification as an optimization technique and a local contrast enhancement
technique. The performance of this method is measured using three parameters, Enhancement Measure (EME), Absolute
Mean Brightness Error (AMBE) and Discrete Entropy (H), for all 22 MIAS mammogram images with
microcalcification. Its enhancement potential is also tested using the Sobel and Otsu methods for the detection of
microcalcification in the mammogram image.

ARTIFICIAL NEURAL NETWORK 

An Artificial Neural Network (ANN) is a powerful classifier that represents input/output relationships. It resembles
the human brain in acquiring knowledge through learning and storing that knowledge in inter-neuron connection strengths.
The ANN's synaptic weights are adjusted, or trained, so that a particular input leads to a specific desired, or target,
output. Figure 1 shows the block diagram for a supervised learning ANN, in which the network is adjusted by comparing
the network output with the desired output until the two match. Once the network is trained, it can be used
to test new input data using the weights obtained from the training session.

Figure 1: Supervised Learning of ANN



STATISTICAL AND TEXTURE FEATURES 

The major mammographic characteristics for mass classification are Intensity, Shape and Texture. Statistical and 
texture features are extracted for each ROI. The extracted features are then used in neural network classifier to train it for 
the recognition of a particular ROI of similar nature. These features are mean, standard deviation, skewness, area, 
perimeter, homogeneity, energy, contrast and entropy. These are adopted from [10, 15, 16].

Mean Value

The mean, μ, is the average gray level of the pixels in the ROI. The mean estimates the value in the
image around which central clustering occurs. It can be calculated using the formula:

$$\mu = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} p(i,j) \tag{1}$$

where p(i,j) is the pixel value at point (i,j) of an image of size M×N.

Standard Deviation 

The standard deviation, σ, is the square root of the mean square deviation of the gray pixel value p(i,j) from its
mean value μ. It describes the dispersion within a local region and is determined using the formula:

$$\sigma = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \bigl(p(i,j) - \mu\bigr)^2} \tag{2}$$



Skewness 



Skewness, S, characterizes the degree of asymmetry of the pixel distribution in the specified window or ROI around
its mean. Skewness is a pure number that characterizes only the shape of the distribution. It is given by:

$$S = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(\frac{p(i,j) - \mu}{\sigma}\right)^3 \tag{3}$$
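
As an illustration, the three intensity statistics (mean, standard deviation and skewness) can be computed for an ROI along the lines of the sketch below. It is plain Python and assumes that the ROI is given as a list of rows of gray levels and that the 1/(MN) normalization is used throughout; the function name `intensity_features` and the sample values are hypothetical and are not taken from the experiments.

```python
import math

def intensity_features(roi):
    """Return (mean, standard deviation, skewness) of a 2-D list of gray levels."""
    pixels = [p for row in roi for p in row]   # flatten the M x N region
    n = len(pixels)                            # n = M * N
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std == 0:
        # Skewness is undefined for a perfectly flat region;
        # returning 0.0 is a convention chosen for this sketch.
        return mean, std, 0.0
    skew = sum(((p - mean) / std) ** 3 for p in pixels) / n
    return mean, std, skew

# Hypothetical 2 x 3 region with one bright pixel.
roi = [[10, 10, 10],
       [10, 10, 40]]
mean, std, skew = intensity_features(roi)
```

A single bright pixel in an otherwise dark region pulls the distribution to the right, so the skewness of this sample region is positive.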

Area 

The area of the ROI in a digital mammogram image is the number of pixels covered by the ROI:

$$A = \text{total number of pixels in the ROI} \tag{4}$$

Perimeter 

The perimeter, P, is equal to the sum of the side lengths of the ROI boundary:

$$P = \sum \text{side lengths} \tag{5}$$
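
The two shape features can be sketched for a segmented ROI as follows. The sketch assumes that the ROI is available as a binary mask (1 inside the mass, 0 outside) and interprets "side lengths" as the exposed pixel edges on the boundary, each of unit length; both the mask representation and the name `area_and_perimeter` are assumptions made for illustration.

```python
def area_and_perimeter(mask):
    """Count ROI pixels (area) and exposed pixel edges (perimeter) of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # An edge is on the boundary when the 4-neighbour across it
            # lies outside the image or outside the ROI.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if not inside or not mask[nr][nc]:
                    perimeter += 1
    return area, perimeter

# Hypothetical mask containing a 2 x 3 rectangular region.
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
area, perimeter = area_and_perimeter(mask)
```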

Homogeneity 

Homogeneity is defined using the gray-level co-occurrence matrix (GLCM) as given below:

$$\text{Homogeneity} = \sum_{i,j} \frac{p(i,j)}{1 + |i - j|} \tag{6}$$

where p(i,j) here denotes the normalized GLCM entry for the gray-level pair (i,j).

Energy

Energy is the sum of squared elements in the GLCM and is also known as uniformity. The range of energy is [0, 1];
energy is 1 for a constant image. It is given by:

$$\text{Energy} = \sum_{i,j} p(i,j)^2 \tag{7}$$

Contrast 

Contrast is a measure of the intensity contrast between a pixel and its neighbor over the whole image. It is
calculated from the GLCM using the equation given below:

$$C = \sum_{i,j} |i - j|^2 \, p(i,j) \tag{8}$$
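
The three GLCM-based texture features (homogeneity, energy and contrast) can be illustrated with the sketch below. It assumes a co-occurrence matrix built from horizontally adjacent pixel pairs (offset of one pixel to the right) and normalized so that its entries sum to 1; the offset, the number of gray levels and the function names `glcm` and `glcm_features` are assumptions of this example, not details stated in the paper.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
                total += 1
    if total == 0:
        raise ValueError("image is too small for the chosen offset")
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Return (homogeneity, energy, contrast) of a normalized GLCM p."""
    homogeneity = energy = contrast = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            homogeneity += v / (1 + abs(i - j))
            energy += v * v
            contrast += abs(i - j) ** 2 * v
    return homogeneity, energy, contrast

# A constant image: every co-occurring pair has identical gray levels.
flat = glcm_features(glcm([[2, 2], [2, 2]], levels=4))
# Alternating columns: every horizontal pair differs by one gray level.
striped = glcm_features(glcm([[0, 1], [0, 1]], levels=2))
```

For the constant image the energy is 1 and the contrast is 0, in line with the description of energy above.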

Entropy 

Entropy is a statistical measure of randomness that can be used to characterize the texture of the input image.
Entropy, H, can also be used to describe the distribution variation in a region. The overall entropy of the image can be
calculated as:

$$H = -\sum_{k=0}^{L-1} Pr_k \log_2 (Pr_k) \tag{9}$$

where Pr_k is the probability of the kth grey level, which can be calculated as Z_k/(M×N), Z_k is the total number of
pixels with the kth grey level, and L is the total number of grey levels.
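
A minimal sketch of the entropy computation is given below, assuming integer grey levels in the range 0 to L-1 and a base-2 logarithm. Grey levels that do not occur are skipped, since they contribute nothing to the sum; the function name `entropy` and the sample regions are hypothetical.

```python
import math

def entropy(roi, levels=256):
    """Shannon entropy (in bits) of the grey-level distribution of a 2-D region."""
    pixels = [p for row in roi for p in row]
    n = len(pixels)                      # n = M * N
    counts = [0] * levels                # counts[k] is the number of pixels at level k
    for p in pixels:
        counts[p] += 1
    return -sum((z / n) * math.log2(z / n) for z in counts if z > 0)

h_flat = entropy([[5, 5], [5, 5]])              # a single grey level
h_four = entropy([[0, 1], [2, 3]], levels=4)    # four equally likely grey levels
```

A flat region has zero entropy, while a region whose four grey levels are equally likely has an entropy of 2 bits.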

PROPOSED METHODS 

To overcome the limitations in sensitivity and accuracy of existing techniques for detecting abnormal masses in
mammographic images, a detection method for digital mammograms should meet the following objectives: high accuracy,
high sensitivity, a low rate of false positives and false negatives, and an increased true positive rate.

To meet these objectives, the proposed method performs the following steps:

• Obtain the data from the Mammographic Image Analysis Society (MIAS) database.

• Apply an image enhancement technique, such as histogram equalization, to the input images.

• Segment the image to obtain the region of interest (ROI).

• Extract nine intensity, shape, and texture features from the ROI.

• Feed the features to a feed-forward neural network.

• Classify the input mammogram image as malignant or normal.

Figure 2: System for Mass Detection

SIMULATION RESULTS AND PERFORMANCE EVALUATION 
Image Database 

To develop and evaluate the proposed system we used the Mammographic Image Analysis Society (MiniMIAS)
[16] database. MIAS is an organization of UK research groups. Films were taken from the UK National Breast Screening
Programme, and the database includes the radiologist's "truth" markings on the locations of any abnormalities that may be
present. Images are available online at the Pilot European Image Processing Archive (PEIPA) at the University of Essex.
The database contains left and right breast images of 161 patients (322 images in total) with ages between 50 and 65.
All images are digitized at a resolution of 1024 x 1024 pixels and at 8-bit gray scale. The data accompanying each image
consist of the location of the abnormality (given as the centre of a circle surrounding the tumor), its radius, breast
position (left or right), type of breast tissue (fatty, fatty-glandular or dense) and tumor type if present (benign or
malignant). Each abnormality has been diagnosed and confirmed by a biopsy to indicate its severity. In this database,
42 images contain malignant masses, 106 images are classed as normal, and the rest contain either microcalcifications
or benign abnormalities.

Database for Experiment 

In this experiment, the Mammographic Image Analysis Society (MIAS) database is used with 64 mammograms,
comprising 29 malignant mammograms and 35 normal mammograms.

For the classification stage, the database is divided into a training set and a testing set.

Malignant: 29 Mammograms (15 for Training / 29 for Testing)

G - CIRC: 1 for training / 1 for testing.
F - CIRC: 1 for training / 2 for testing.
G - ASYM: 1 for training / 2 for testing.
D - ASYM: 2 for training / 2 for testing.
F - ASYM: 0 for training / 2 for testing.
G - ARCH: 2 for training / 3 for testing.
F - ARCH: 1 for training / 2 for testing.
D - ARCH: 2 for training / 4 for testing.
F - SPIC: 2 for training / 2 for testing.
G - SPIC: 0 for training / 3 for testing.
D - SPIC: 0 for training / 1 for testing.
G - CALC: 1 for training / 1 for testing.
D - CALC: 1 for training / 1 for testing.
F - CALC: 0 for training / 1 for testing.
F - MISC: 1 for training / 1 for testing.
D - MISC: 0 for training / 1 for testing.

Normal: 35 Mammograms (17 for Training / 35 for Testing)

F - NORM: 17 for training / 27 for testing.
D - NORM: 0 for training / 4 for testing.
G - NORM: 0 for training / 4 for testing.

Results and Performance 

Input images are taken from the Mammographic Image Analysis Society (MIAS) database. These images contain some
noise, which is removed before processing: the images are enhanced using histogram equalization. Segmentation is then
applied to extract the region of interest (ROI) from each mammogram image. Next, the features area, average gray level
(mean), standard deviation, skewness, perimeter, homogeneity, energy, contrast and entropy are extracted from the
selected ROI. A feed-forward neural network is then trained on these extracted features. The network has one input
layer, two hidden layers, and one output layer. For mass classification, the network target is set to 1 or 0, since only
the malignant and normal cases of breast cancer are considered: the network output is 1 for a malignant mass and 0 for
a normal mass.
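
The structure of such a classifier can be sketched as a forward pass through a small feed-forward network with nine inputs, two hidden layers and a single output. The hidden layer sizes (10 and 5), the sigmoid activation, the 0.5 decision threshold, the random untrained weights and the sample feature vector are all assumptions made for illustration; none of them are specified in the paper, and a real classifier would first be trained on the labelled features.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_inputs, n_neurons, rng):
    # One weight per input plus a trailing bias term for each neuron.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]

def layer_forward(layer, inputs):
    outputs = []
    for weights in layer:
        z = sum(w * x for w, x in zip(weights[:-1], inputs)) + weights[-1]
        outputs.append(sigmoid(z))
    return outputs

def predict(network, features):
    """Propagate a feature vector through every layer and return the output score."""
    activation = features
    for layer in network:
        activation = layer_forward(layer, activation)
    return activation[0]

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
network = [init_layer(9, 10, rng),     # first hidden layer (size assumed)
           init_layer(10, 5, rng),     # second hidden layer (size assumed)
           init_layer(5, 1, rng)]      # single output neuron
features = [0.5] * 9                   # hypothetical normalized feature vector
score = predict(network, features)
label = 1 if score >= 0.5 else 0       # 1 = malignant, 0 = normal
```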

Histogram Equalization 

The histogram of a digital image with gray levels in the range [0, L-1] is a discrete function

$$g(r_k) = n_k$$

where r_k is the kth gray level and n_k is the number of pixels in the image having gray level r_k.
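
Histogram equalization can be sketched as follows, using the common mapping in which each gray level is replaced by its scaled cumulative histogram value. The rounding rule, the default of 256 gray levels and the function name `equalize` are assumptions of this example rather than details taken from the paper.

```python
def equalize(image, levels=256):
    """Histogram-equalize a 2-D list of integer gray levels in [0, levels - 1]."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    hist = [0] * levels                 # hist[k] is the pixel count at gray level k
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram: number of pixels with gray level <= k.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    # Map each gray level to its scaled cumulative value, rounded to nearest.
    mapping = [int((levels - 1) * c / n + 0.5) for c in cdf]
    return [[mapping[p] for p in row] for row in image]

# Hypothetical low-contrast patch whose gray levels span only 100..102.
enhanced = equalize([[100, 101], [101, 102]])
```

For this patch the three closely spaced gray levels are spread to 64, 191 and 255, which is the contrast stretching effect that the enhancement step relies on.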
Figure 3: Malignant (mdb184.pgm) MIAS Database (a) Original Mammogram Image;
(b) Histogram Equalization Image (Enhanced Image); (c) Before Histogram Equalization
Distribution Plot; (d) After Histogram Equalization Distribution Plot

Table 1: Cumulative Histogram Distribution for Malignant (mdb184) Case

Figure 4: Normal (mdb140.pgm) MIAS Database (a) Original Image; (b) Histogram
Equalization Image (Enhanced Image); (c) Before Histogram Equalization
Distribution Plot; (d) After Histogram Equalization Distribution Plot

Table 2: Cumulative Histogram Distribution for Normal (mdb140) Case

Figure 5: (a) Segmented Result (ROI) of Malignant Mammogram;
(b) Segmented Result (ROI) of Normal Mammogram

Result of Mass Detection 

Nine parameters, i.e. area, mean, standard deviation, skewness, perimeter, homogeneity, contrast, energy and
entropy, are used to train the ANN. The results of mass detection for all mammograms are given in the
following table:



Table 3: Output Result of ANN

Class        Total Number    True    False
Malignant    29              28      1
Normal       35              34      1

TP: Predicts malignant as malignant.
FN: Predicts malignant as normal.
TN: Predicts normal as normal.
FP: Predicts normal as malignant.

Performance Evaluation

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$
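
The reported figures can be checked with a short calculation using the standard definitions of accuracy, sensitivity and specificity and the counts in Table 3 (28 of 29 malignant and 34 of 35 normal mammograms classified correctly). The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # fraction of malignant cases detected
    specificity = tn / (tn + fp)      # fraction of normal cases recognised
    return accuracy, sensitivity, specificity

# Counts from Table 3: 28 true positives, 1 false negative,
# 34 true negatives, 1 false positive.
accuracy, sensitivity, specificity = classification_metrics(tp=28, fn=1, tn=34, fp=1)
```

These counts give an accuracy of 62/64 = 96.875%, a sensitivity of 28/29 and a specificity of 34/35, which agree with the percentages quoted in the abstract when the latter two are truncated to three decimal places.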
Figure 6: Simulation Results: (a) Performance Plot; (b) Training State Plot; (c) Regression Plot

CONCLUSIONS 

Mass classification is a vital stage in computer-aided breast cancer detection.
Different classifiers have been used in biomedical imaging applications such as breast cancer detection from mammograms,
and ANNs show very good performance in medical diagnostic systems. In this paper, the image is first
enhanced using histogram equalization. Segmentation is then used to
extract the region of interest (ROI), which is obtained by peak analysis of the histogram of the breast tissue.
This also yields the exact boundaries of suspicious regions, which makes it convenient to obtain good shape features for
classification. The proposed features are good descriptors, especially for spiculated masses. With the artificial
neural network (ANN) classifier, experimental results show that the accuracy of this method is 96.875%, because it
has low false positive and false negative rates. Furthermore, the true positive detection rate of this methodology is good
for a data set of 64 mammograms. Moreover, the proposed method is simple and takes little time for iterations. It is
therefore effective in terms of both time and precision.